ABSTRACT

We measure the logarithmic scatter in mass at fixed richness for clusters in the maxBCG cluster catalog, an 
optically selected cluster sample drawn from SDSS imaging data. Our measurement is achieved by demanding 
consistency between available weak lensing and X-ray measurements of the maxBCG clusters, and the X-ray 
luminosity-mass relation inferred from the 400d X-ray cluster survey, a flux limited X-ray cluster survey. We 
find $\sigma_{\ln M|N_{200}} = 0.45^{+0.20}_{-0.18}$ (95% CL) at $N_{200} \approx 40$, where $N_{200}$ is the number of red sequence galaxies in a cluster.
As a byproduct of our analysis, we also obtain a constraint on the correlation coefficient between $\ln L_X$ and
$\ln M$ at fixed richness, which is best expressed as a lower limit, $r_{L,M|N} > 0.85$ (95% CL). This is the first
observational constraint placed on a correlation coefficient involving two different cluster mass tracers. We use 
our results to produce a state of the art estimate of the halo mass function at z = 0.23 — the median redshift 
of the maxBCG cluster sample — and find that it is consistent with the WMAP5 cosmology. Both the mass 
function data and its covariance matrix are presented. 

Subject headings: galaxies: clusters - X-rays: galaxies: clusters - cosmology: observation 



1. INTRODUCTION 

The space density of galaxy clusters as a function of clus- 
ter mass is a well-known cosmological probe (see e.g. Holder 
et al. 2001; Haiman et al. 2001; Rozo et al. 2004; Lima & Hu 
2004), and ranks among the best observational tools for constraining
$\sigma_8$, the normalization of the matter power spectrum
in the low redshift universe (see e.g. Frenk et al. 1990; Henry
& Arnaud 1991; Schuecker et al. 2003; Gladders et al. 2007;
Rozo et al. 2007b). The basic idea is this: in the high mass
limit, the cluster mass function falls off exponentially with 
mass, with the fall-off depending sensitively on the amplitude 
of the matter density fluctuations. Observing this exponential
cutoff can thus place tight constraints on $\sigma_8$. In practice,
however, the same exponential dependence that makes clus- 
ter abundances a powerful cosmological probe also renders it 
susceptible to an important systematic effect, namely uncer- 
tainties in the estimated masses of clusters. 

Because mass is not a direct observable, cluster masses 
must be determined using observable mass tracers such as 
X-ray emission, SZ decrements, weak lensing shear, or cluster
richness (a measure of the galaxy content of the cluster).
Of course, such mass estimators are noisy, meaning there can 
be significant scatter between the observable mass tracer and 
cluster mass. Since the mass function declines steeply with 
mass, up-scattering of low mass systems into high mass bins 
can result in a significant boost to the number of systems with 
apparently high mass (Lima & Hu 2005). If this effect is not 
properly modeled, the value of $\sigma_8$ derived from such a cluster
sample will be overestimated. 

One approach for dealing with this difficulty is to employ 
mass tracers that have minimal scatter, thereby reducing the 
impact of said scatter on the recovered halo mass function. 
For instance, Kravtsov et al. (2006) introduced a new X-ray 
mass estimator, $Y_X = M_{gas} T_X$, which in their simulations exhibits
an intrinsic scatter of only $\approx 8\%$, independent of the
dynamical state of the cluster. Use of a mass estimator with
such low scatter should lead to improved estimates of $\sigma_8$
from X-ray cluster surveys (Pierpaoli et al. 2001; Reiprich &
Böhringer 2002; Schuecker et al. 2003; Henry 2004; Stanek
et al. 2006). 

Such tightly-correlated mass tracers are not always avail- 
able. In such cases, determination of the scatter in the mass- 
observable relation is critical to accurately inferring the mass 
function and thereby determining cosmological parameters. 
Of course, in practice, it is impossible to determine this scatter
to arbitrary accuracy, but since the systematic boost to
the mass function is proportional to the square of the scatter,
i.e. the variance (Lima & Hu 2005), even moderate constraints
on the scatter can result in tight $\sigma_8$ constraints.

In this paper, we use optical and X-ray observations to 
constrain the scatter in the mass-richness relation for the 
maxBCG cluster catalog presented in Koester et al. (2007a). 
Specifically, we use observational constraints on the mean 
mass-richness relation, and on the mean and scatter of the 
$L_X$-richness relation, to convert independent estimates of the
scatter in the $L_X$-$M$ relation into estimates of the scatter in
the mass-richness relation. An interesting byproduct of our 
analysis is a constraint on the correlation coefficient between
mass and X-ray luminosity at fixed richness. To our knowl- 
edge, this is the first time that a correlation coefficient involv- 
ing multiple cluster mass tracers has been empirically deter- 
mined. 

The layout of the paper is as follows. In section 1.1 we 
lay out the notation and definitions used throughout the paper. 
Section 2 presents the data sets used in our analysis. In sec- 
tion 3 we present a pedagogical description of our method for 
constraining the scatter in the richness-mass relation, while 
section 4 formalizes the argument. Our results are found in 
section 5, and we compare them to previous work in section 
6. In section 7, we use our result to estimate the halo mass 
function in the local universe at z = 0.23, the median redshift 
of the maxBCG cluster sample, and we demonstrate that our 
recovered mass function is consistent with the latest cosmo- 
logical constraints from WMAP (Dunkley et al. 2008). A de- 
tailed cosmological analysis of our results will be presented in 
a forthcoming paper (Rozo et al., in preparation). Our sum- 
mary and conclusions are presented in section 8. 

1.1. Notation and Conventions 

We summarize here the notation and conventions employed
in this work. Given any three cluster mass tracers (possibly
including mass itself) $X$, $Y$, and $Z$, we make the standard
assumption that the probability distribution $P(X,Y|Z)$ is a
bivariate lognormal. The parameters $A_{X|Z}$, $B_{X|Z}$, and $\alpha_{X|Z}$ are
defined such that

$$\langle \ln X | Z \rangle = A_{X|Z} + \alpha_{X|Z} \ln Z \qquad (1)$$
$$\ln \langle X | Z \rangle = B_{X|Z} + \alpha_{X|Z} \ln Z. \qquad (2)$$

Note the slopes of the mean and logarithmic mean are the
same, as appropriate for a log-normal distribution. The scatter
in $\ln X$ at fixed $Z$ is denoted $\sigma_{X|Z}$, and the correlation
coefficient between $\ln X$ and $\ln Y$ at fixed $Z$ is denoted $r_{X,Y|Z}$. We
emphasize that all quoted scatters are the scatter in the natural
logarithm, not in dex. Note these parameters are simply
the elements of the covariance matrix specifying the Gaussian
distribution $P(\ln X, \ln Y | \ln Z)$. Under our lognormal
assumption for $P(X,Y|Z)$, the parameters $A_{X|Z}$ and $B_{X|Z}$ are related
via

$$B_{X|Z} = A_{X|Z} + \frac{1}{2}\sigma_{X|Z}^2. \qquad (3)$$
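The link between the two amplitudes can be checked numerically. The sketch below is illustrative only: it draws lognormal samples with an arbitrary log-mean `a` and log-scatter `sigma` (both hypothetical values, not taken from the paper) and compares the logarithm of the sample mean with the standard lognormal identity $\ln\langle X\rangle = \langle\ln X\rangle + \sigma^2/2$.

```python
import math
import random

def lognormal_mean_check(a=0.3, sigma=0.45, n=200_000, seed=1):
    """Compare ln<X> estimated from samples with <ln X> + sigma^2 / 2."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        # ln X is Gaussian with mean a and standard deviation sigma.
        total += math.exp(rng.gauss(a, sigma))
    b_sampled = math.log(total / n)      # ln of the sample mean of X
    b_analytic = a + 0.5 * sigma ** 2    # lognormal identity
    return b_sampled, b_analytic

b_sampled, b_analytic = lognormal_mean_check()
print(b_sampled, b_analytic)
```

The two numbers should agree to within the Monte Carlo noise, which shrinks as the number of samples grows.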
In this work, the quantities of interest are cluster mass $M$,
X-ray luminosity $L_X$, and cluster richness $N$. Unless otherwise
specified, cluster mass is defined as $M_{500c}$, the mass contained
within an overdensity of 500 relative to critical. $L_X$
is the total luminosity in the rest-frame 0.5-2.0 keV band,
and $N$ is the maxBCG richness measure $N_{200}$, the number
of red sequence galaxies with luminosity above $0.4L_*$ within
an aperture such that the mean density within said radius is,
on average, $200\,\Omega_m^{-1}$ times the mean galaxy density assuming
$\Omega_m = 0.3$. Likewise, unless otherwise stated all parameters
governing the relations between $M$, $L_X$, and $N$ assume that $M$
is measured in units of $10^{14}\,M_\odot$, $L_X$ is measured in units of
$10^{43}$ ergs/s, and $N$ is measured in "units" of 40 galaxies. For
instance, including units explicitly, the mean relation between
cluster mass and richness reads

$$\frac{\langle M|N\rangle}{10^{14}\,M_\odot} = \exp(B_{M|N}) \left(\frac{N}{40}\right)^{\alpha_{M|N}}. \qquad (4)$$

A Hubble parameter $h = 0.71$ is assumed throughout.$^{14}$ In addition,
the weak lensing data presented in this
analysis assumed a flat $\Lambda$CDM cosmology with $\Omega_m = 0.27$.
The recovered mass function has the standard Hubble parameter
degeneracy.

2. DATA SETS 

In this work we use the public maxBCG cluster catalog presented
in Koester et al. (2007a), which is an optically selected
volume limited catalog of close to 14,000 clusters over the
redshift range $z \in [0.1, 0.3]$. These clusters were found in
7500 deg$^2$ of imaging data from the Sloan Digital Sky Survey
(SDSS, York et al. 2000) using the maxBCG cluster finding
algorithm (Koester et al. 2007b). This algorithm identifies
clusters as overdensities of red sequence galaxies. All
clusters are assigned a redshift based on the SDSS photometric
data only, and these redshifts are known to be accurate to
within a dispersion $\Delta z \approx 0.01$. Every cluster is also assigned
a richness measure $N_{200}$, which is the number of red sequence
galaxies above a luminosity cut of $0.4L_*$ and within a specified
scaled aperture, centered on the Brightest Cluster Galaxy
(BCG) of each cluster. Only clusters with $N_{200} > 10$ are included
in the final catalog. Interested readers are referred to
Koester et al. (2007a) and Koester et al. (2007b) for further
details. In the interest of economy of notation, from now on
we denote the maxBCG richness measure simply as $N$.

The relationship between cluster richness and various well 
known mass tracers has been studied in large, homogeneous 
samples, such as 2MASS (Dai et al. 2007) and SDSS (Becker 
et al. 2007; Johnston et al. 2007; Rykoff et al. 2008b; Mandel- 
baum et al. 2008b). Of particular interest to us are the weak 
lensing measurements of the mean mass as a function of rich- 
ness, and the X-ray measurements of the mean and scatter of 
the X-ray luminosity as a function of richness. The former 
analysis has been carried out by Johnston et al. (2007) based 
on the weak lensing data presented in Sheldon et al. (2007), 
and independently by Mandelbaum et al. (2008a). In short, 
Sheldon et al. (2007) stacked maxBCG clusters within narrow 
richness bins, and measured the average weak lensing shear 
profile of the clusters. These shear profiles were turned into 
surface mass density contrast profiles using the redshift distri- 
bution of background sources estimated with the methods of 
Lima et al. (2008) and the neural net photometric redshift esti- 
mators described in Oyaizu et al. (2008). Then, Johnston et al. 
(2007) fit the resulting profiles using a halo model scheme to 
obtain tight constraints on the mean mass of maxBCG clusters 
for each of the richness bins under consideration. The Man- 
delbaum et al. (2008b) analysis is very similar in spirit to the 
one described above. The main differences are the way the 
source redshift distribution is estimated, and the details of the 
model fitting used to recover the masses. The differences in the
results between these two analyses are discussed in appendix
A.2, where we use them to set priors on the mass-richness
relation. 

The measurement of the mean X-ray luminosity of 
maxBCG clusters has been carried out by Rykoff et al. 
(2008b) following an approach similar to that pioneered in 
Dai et al. (2007). The necessary X-ray data is readily avail- 
able from the ROSAT All-Sky Survey (RASS, Voges et al. 
1999). In short, Rykoff et al. (2008b) stacked the RASS pho- 
ton maps (Voges et al. 2001) centered on maxBCG clusters 
in narrow richness bins. The background subtracted stacked
photon counts within a $750\,h^{-1}$ kpc aperture were used to estimate
the mean X-ray luminosity $L_X$ in the 0.1-2.4 keV rest
frame of the clusters. In addition, Rykoff et al. (2008b) measured
the scatter in X-ray luminosity at fixed richness by individually
measuring $L_X$ for all maxBCG clusters with $N > 30$.
It is worth noting that due to the shallowness of RASS, many
of the maxBCG clusters are not X-ray luminous enough to
be detected individually. However, non-detections and upper
limits for $L_X$ for individual systems were properly taken into
consideration using the Bayesian approach detailed in Kelly
(2007), and the recovered mean X-ray luminosity from this
Bayesian analysis was fully consistent with the stacked means.

14 For other values of $h$, our weak lensing masses scale as $M \propto h^{-1}$ and the
X-ray luminosities as $L_X \propto h^{-2}$.

In addition to the data sets above, we use the constraints
on the $L_X$-$M$ relation from Vikhlinin et al. (2008). These
constraints are based on the 400d cluster X-ray survey, a flux
limited cluster survey based on ROSAT pointed observations
with an effective sky coverage of 397 deg$^2$ (Burenin et al.
2007). Briefly, Vikhlinin et al. (2008) measured both the total
soft band X-ray luminosity and the cluster mass for each
cluster in the sample. X-ray luminosities are estimated from
ROSAT data, and measure the luminosity in the rest-frame
0.5-2.0 keV band, extrapolated to infinity assuming standard
$\beta$ profiles. Cluster masses are estimated based on the values of
$Y_X$ derived from followup Chandra observations, though they
note that the results they obtain using different mass tracers
such as X-ray temperature and total gas mass are very similar.
The $M$-$Y_X$ relation is itself calibrated based on hydrostatic
mass estimates. Importantly, Vikhlinin et al. (2008) explicitly
correct for the Malmquist bias expected for a flux limited
cluster sample, so the $L_X$-$M$ relation they derive can be interpreted
as the relation one would obtain using a mass limited
cluster sample.

For this work, we have repeated the analysis in Rykoff et al.
(2008b) with a slightly different definition for $L_X$. In particular,
we measure the X-ray luminosity in the rest-frame
0.5-2.0 keV band within a $1\,h^{-1}$ Mpc aperture. The change in
band is tailored to match the energy band used by Vikhlinin
et al. (2008), which we used to place priors on the $L_X$-$M$
relation. It is worth noting that Vikhlinin et al. (2008) do not
use a $1\,h^{-1}$ Mpc aperture, as we do. We have, however, carefully
calibrated the scaling between our $L_X$ definition and that
of Vikhlinin et al. (2008) so as to be able to use their results in
our analysis. A detailed description of our measurements
can be found in appendix A.3.

3. RELATING CLUSTER MASS, X-RAY LUMINOSITY, AND 
RICHNESS 

The problem we are confronted with is the following: we 
have four pieces of observational data, namely 

• The abundance of galaxy clusters as a function of rich- 
ness. 

• The mean relation between cluster richness and mass. 

• The mean and variance of the relation between cluster 
richness and X-ray luminosity. 

• The mean and variance of the relation between cluster 
X-ray luminosity and mass. 

From these data, we wish to determine the scatter in mass at
fixed richness for the cluster sample under consideration.

The basic idea behind our analysis is as follows. Consider
the probability $P(M, L_X|N)$, which we take to be Gaussian in
$\ln M$ and $\ln L_X$. This probability distribution is completely
specified by the mean and variance of both $M$ and $L_X$ at fixed
richness, and by the correlation coefficient between $M$ and $L_X$.
Of these, there are only two quantities that are not already
observationally constrained: $\sigma_{M|N}$, the scatter in mass at fixed
richness, and $r_{M,L|N}$, the correlation coefficient between mass
and $L_X$ at fixed richness.

FIG. 1. Contours of constant $L_X$-$M$ parameters. For each assumed value
of the scatter $\sigma_{M|N}$ and correlation coefficient $r_{M,L|N}$, we predict
the amplitude, slope, and scatter of the $L_X$-$M$ relation of a mass selected
sample of clusters with $M > 3 \times 10^{14}\,M_\odot$. Contours of constant amplitude,
slope, and scatter are shown with the solid, dashed, and dotted lines respectively.
The thicker lines correspond to the central values of the $L_X$-$M$ priors
discussed in appendix A.4 and summarized in Table 1, while the other two
contours enclose the 95% confidence region for each of the parameters. The
second slope contour falls outside the region of parameter space shown
in the figure. The intersection of the three separate regions corresponds to
acceptable values for the two unknown parameters $\sigma_{M|N}$ and $r_{M,L|N}$.

Suppose now that we guessed values for these two quantities,
so that the probability distribution $P(M, L_X|N)$ is fully
specified. Given the abundance function $n(N)$, we can use
$P(M, L_X|N)$ to randomly assign a mass and an X-ray luminosity
to every cluster in the sample. We can then select
a mass limited sub-sample, and measure the corresponding
$L_X$-$M$ relation, comparing it to the $L_X$-$M$ measurement from
Vikhlinin et al. (2008). Since the $L_X$-$M$ relation we predict
depends on our assumptions about $P(M, L_X|N)$, there should
only be a small region in parameter space where our predictions
are consistent with independent observational constraints
on the $L_X$-$M$ relation.

Figure 1 illustrates this idea. To create the figure, we have
set every observed parameter of the distribution $P(M, L_X|N)$
to the central value of the priors described in appendix A and
summarized in table 1. We then defined a grid in the two
dimensional space spanned by $\sigma_{M|N}$ and $r_{M,L|N}$, and carried
through the argument described above. The resulting predictions
for the amplitude, slope, and scatter of the $L_X$-$M$ relation
as a function of $\sigma_{M|N}$ and $r_{M,L|N}$ are shown in the figure.
We plot contours of constant amplitude, slope, and scatter of
the $L_X$-$M$ relation as solid, dashed, and dotted lines respectively.
The thicker curves correspond to the central values
of the priors, while thinner curves mark the corresponding
95% confidence limits. As we can see, all three contours intersect
in a finite region of parameter space, indicating good
agreement between our weak lensing and X-ray data, and the
independent determination of the $L_X$-$M$ relation. Based on
Figure 1, we expect a detailed analysis should constrain our
parameters to $\sigma_{M|N} \approx 0.40$ and $r_{M,L|N} \approx 0.9$. The rest of this
paper is simply a way of formalizing the argument described
above in order to place errors on both $\sigma_{M|N}$ and $r_{M,L|N}$.

4. FORMALISM 

We wish to formalize the above argument in order to place 
quantitative constraints on the scatter in mass at fixed rich- 
ness. Details of how we go about doing so are presented be- 
low. Readers interested only in our results can move directly 
to section 5. 

4.1. Likelihood Model 

As we mentioned above, the key point in our analysis is
our ability to compute the amplitude and slope of the mean
relation $\langle \ln L_X | M \rangle$, and the scatter about this mean, as a function
of our two parameters of interest: the scatter in mass at
fixed richness and the correlation coefficient between $M$ and
$L_X$ at fixed $N$. Let us define $\mathbf{x} = \{A_{L|M}, \alpha_{L|M}, \sigma_{L|M}\}$, and let
$\mathbf{p} = \{\sigma_{M|N}, r_{M,L|N}\}$ denote our parameters of interest. Our predictions
for the $L_X$-$M$ relation as a function of our parameters
of interest can be summarized simply as $\mathbf{x}(\mathbf{p})$. Now, adopting
a Bayesian framework, a set of priors on $\mathbf{x}$ is simply a probability
distribution $P_x(\mathbf{x})$. Since $\mathbf{x}$ is a function of $\mathbf{p}$, the priors
immediately define a probability distribution over $\mathbf{p}$ given by

$$P(\mathbf{p}) = P_x(\mathbf{x}(\mathbf{p}))\,\det(\partial\mathbf{x}/\partial\mathbf{p}). \qquad (5)$$

Since we know how to compute both $P_x(\mathbf{x})$ and $\mathbf{x}(\mathbf{p})$, we can
find any confidence regions for our parameters of interest.

The problem we are confronted with, however, is slightly
more complicated, in that the functions $\mathbf{x}$ depend not only
on $\mathbf{p}$, but also on additional nuisance parameters $\mathbf{q}$. Indeed,
our predictions for the observable parameters of the $L_X$-$M$
relation depend on both the abundance function of clusters
and $P(M, L_X|N)$. The abundance function can be accurately
described by a Schechter function (we explicitly checked that a
Schechter function is statistically acceptable),

$$n(N) \propto N^{-\tau} \exp(-N/N_*). \qquad (6)$$

Given a Schechter fit, our prediction for the $L_X$-$M$ relation
will also depend on the value of the parameters $\tau$ and $N_*$.
Likewise, the distribution $P(M, L_X|N)$ also depends on the
amplitude and slope of the means $\langle M|N\rangle$ and $\langle L_X|N\rangle$, as well
as the scatter in $L_X$ at fixed $N$. All in all, we have seven additional
nuisance parameters $\mathbf{q} = \{N_*, \tau, B_{M|N}, \alpha_{M|N}, A_{L|N}, \alpha_{L|N}, \sigma_{L|N}\}$.
Let $\mathbf{r} = \{\mathbf{p}, \mathbf{q}\}$ denote the full set of parameters. The priors
from the $L_X$-$M$ relation define a probability distribution over
$\mathbf{r}$ given by

$$P(\mathbf{r}) = P_x(\mathbf{x}(\mathbf{r}))\,\det(\partial\mathbf{x}/\partial\mathbf{r}). \qquad (7)$$

Since we have a total of nine parameters, and only three observables
from the $L_X$-$M$ relation, it is obvious that the above
likelihood function will result in large degeneracies because
the parameters are under-constrained. If one has priors $P_0(\mathbf{q})$
on the nuisance parameters, however, the probability distribution
$P(\mathbf{p})$ in the parameters of interest is given by



$$P(\mathbf{p}) = \int d\mathbf{q}\, P_0(\mathbf{q})\, P_x(\mathbf{x}(\mathbf{p},\mathbf{q}))\, \det(\partial\mathbf{x}/\partial\mathbf{r}). \qquad (8)$$

This equation allows us to compute $P(\mathbf{p})$, and therefore place
constraints on our parameters of interest. In practice, we will
ignore the determinant term in the probability distribution defined
in equation 8. This is because the function $\mathbf{x}(\mathbf{r})$ is estimated
using a Monte Carlo approach, implying that accurate
numerical estimates of the Jacobian $\partial\mathbf{x}/\partial\mathbf{r}$ would be too
computationally intensive to be performed. Fortunately, the
determinant typically introduces only slight modulations of
the likelihood, so we do not expect our results to be adversely
affected by this.



4.2. Implementation 

We estimate the probability distribution $P(\mathbf{p})$ using a Monte
Carlo approach. Ignoring an overall normalization constant
and setting $\det(\partial\mathbf{x}/\partial\mathbf{r}) =$ constant, we have

$$P(\mathbf{p}) = \frac{1}{N_{draws}} \sum_{i=1}^{N_{draws}} P_x(\mathbf{x}(\mathbf{p},\mathbf{q}_i)) \qquad (9)$$

where $\mathbf{q}_i$, for $i = 1$ through $N_{draws}$, are random draws of the nuisance
parameters $\mathbf{q}$, drawn from the prior distribution $P_0(\mathbf{q})$.
We set $N_{draws} = 3000$ as our default value (see below for further
discussion).
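The Monte Carlo average over nuisance draws can be sketched as follows. This is a minimal toy, not the code used for the analysis: `draw_q`, `predict_x`, and `prior_x` are hypothetical stand-ins for a draw from the nuisance priors, the prediction x(p, q), and the prior on the predicted observable. Here they are a one-dimensional Gaussian draw, a simple sum, and an unnormalized Gaussian.

```python
import math
import random

def marginal_likelihood(p, draw_q, predict_x, prior_x, n_draws=3000, seed=0):
    """Monte Carlo estimate of P(p): the average of prior_x(predict_x(p, q_i))."""
    rng = random.Random(seed)  # same seed at every p, so all grid points share draws
    total = 0.0
    for _ in range(n_draws):
        q = draw_q(rng)                     # one draw of the nuisance parameter
        total += prior_x(predict_x(p, q))   # likelihood of the predicted observable
    return total / n_draws

# Toy stand-ins (assumptions for illustration only).
draw_q = lambda rng: rng.gauss(0.0, 0.2)
predict_x = lambda p, q: p + q
prior_x = lambda x: math.exp(-0.5 * ((x - 1.0) / 0.1) ** 2)

grid = [0.5 + 0.05 * i for i in range(21)]
likelihood = [marginal_likelihood(p, draw_q, predict_x, prior_x) for p in grid]
best_p = grid[max(range(len(grid)), key=likelihood.__getitem__)]
print(best_p)
```

In this toy the marginalized likelihood is wider than the prior on x alone, because the nuisance scatter adds in quadrature. Nuisance uncertainties broaden the constraint on the parameters of interest in the same way.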

The prior distributions for our nuisance parameters are
characterized by a statistical and a systematic error. The former
is modeled as Gaussian and the latter using a top-hat distribution.
Thus, given a prior of the form

$$q = \bar{q} \pm \sigma_q^{stat} \pm \sigma_q^{sys} \qquad (10)$$

a random draw is obtained by setting

$$q_i = \bar{q} + \Delta q_i^{stat} + \Delta q_i^{sys} \qquad (11)$$

where $\Delta q_i^{stat}$ is drawn from a Gaussian of zero mean with a
covariance matrix defined by the statistical errors, and $\Delta q_i^{sys}$
is drawn from a top hat distribution that is non-zero only for
$|\Delta q_i^{sys}| \leq \sigma_q^{sys}$.
The probability distribution $P_x(\mathbf{x}(\mathbf{p},\mathbf{q}))$ used in equation 9
is the product of the likelihoods $P_x(x(\mathbf{p},\mathbf{q}))$ for each of the
$L_X$-$M$ parameters $x \in \mathbf{x} = \{A_{L|M}, \alpha_{L|M}, \sigma_{L|M}\}$. The probability
for each $L_X$-$M$ parameter is given by the convolution of the
top-hat and Gaussian distributions defined by the statistical
and systematic errors of $x$, so that

$$P_x(x(\mathbf{p},\mathbf{q})) = \frac{1}{4\sigma_x^{sys}} \left[\mathrm{erf}(x_+) - \mathrm{erf}(x_-)\right] \qquad (12)$$

where

$$x_\pm = \frac{\pm\sigma_x^{sys} - (x(\mathbf{p},\mathbf{q}) - \bar{x})}{\sqrt{2}\,\sigma_x^{stat}}. \qquad (13)$$

Note that the above equations are appropriate only when the
various $L_X$-$M$ parameters are uncorrelated, so it is important
to place the priors at the pivot point of the $L_X$-$M$ relation
($M_{pivot} = 3.9 \times 10^{14}\,M_\odot$). This explains why Table 1 quotes
a prior on $A_{L|M} + 1.361\,\alpha_{L|M} + 1.5(\sigma_{L|M}^2 - 0.40^2)$ rather than on
$A_{L|M}$ alone.
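The convolution of a top hat with a Gaussian in equations 12 and 13 can be evaluated directly with the error function. The sketch below is one possible implementation. The function name is hypothetical, and the fallback to a plain Gaussian when no systematic error is quoted is an assumption added here so that the stat-only priors in Table 1 can be handled by the same call.

```python
import math

def tophat_gauss_likelihood(x_pred, x_central, sigma_stat, sigma_sys):
    """Density of (top hat of half-width sigma_sys) convolved with (Gaussian of width sigma_stat)."""
    d = x_pred - x_central
    if sigma_sys == 0.0:
        # Assumed limiting case: no systematic error, plain Gaussian density.
        return math.exp(-0.5 * (d / sigma_stat) ** 2) / (math.sqrt(2.0 * math.pi) * sigma_stat)
    norm = math.sqrt(2.0) * sigma_stat
    x_plus = (sigma_sys - d) / norm
    x_minus = (-sigma_sys - d) / norm
    return (math.erf(x_plus) - math.erf(x_minus)) / (4.0 * sigma_sys)

# Numerical normalization check using the amplitude prior 2.45 +/- 0.08 (stat) +/- 0.23 (sys).
step = 0.001
total = step * sum(
    tophat_gauss_likelihood(2.45 + (i - 1500) * step, 2.45, 0.08, 0.23)
    for i in range(3001)
)
print(total)
```

The density is flat near the central value and falls off with a Gaussian edge beyond $\pm\sigma^{sys}$, so a systematic error widens the allowed range without favoring any value inside it.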

We also need to specify how the function $\mathbf{x}(\mathbf{p},\mathbf{q})$ is evaluated.
We do this using a Monte Carlo approach. Given $\mathbf{p}$
and $\mathbf{q}$, we generate $N_{cl} = 10^5$ mock clusters in the richness
range $N \in [10, 200]$. We then randomly draw mass and X-ray
luminosity values for each of these clusters based on the distribution
$P(M, L_X|N)$, and select a mass limited subsample of
clusters using a mass cut $M > M_{min}$ with $M_{min} = 3 \times 10^{14}\,M_\odot$
(the reason for this particular value is explained below). Using
a least squares fitting routine, we find the best fit line between
$\ln L_X$ and $\ln M$. This defines both $A_{L|M}(\mathbf{p},\mathbf{q})$ and $\alpha_{L|M}(\mathbf{p},\mathbf{q})$.
The scatter $\sigma_{L|M}(\mathbf{p},\mathbf{q})$ is defined as the root mean square fluctuation
about the best fit line.
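The mock-catalog evaluation of x(p, q) described above can be sketched in a few dozen lines. This is an illustrative reimplementation under stated assumptions, not the code used for the analysis. Richness is drawn from the Schechter form by rejection sampling, which is a choice made here. Masses are in units of $10^{14}\,M_\odot$ and richness in units of 40. The mass amplitude is converted from the amplitude of the mean to the mean of the log using the lognormal relation. The value 1.91 is treated as the amplitude of $\langle \ln L_X|N\rangle$, which is a simplifying assumption. All function and dictionary key names are hypothetical.

```python
import math
import random

def sample_richness(rng, tau, n_star, n_min=10.0, n_max=200.0):
    """Draw N from n(N) ~ N^-tau * exp(-N/N*) on [n_min, n_max] by rejection."""
    lo, hi = n_min ** (1.0 - tau), n_max ** (1.0 - tau)
    while True:
        # Inverse-CDF draw from the truncated power law N^-tau (requires tau != 1).
        n = (lo + rng.random() * (hi - lo)) ** (1.0 / (1.0 - tau))
        # Accept with the exponential factor, normalized to 1 at n_min.
        if rng.random() < math.exp(-(n - n_min) / n_star):
            return n

def predict_lx_m(sigma_m, r, q, n_cl=100_000, m_min=3.0, seed=0):
    """Return (amplitude, slope, rms scatter) of ln Lx versus ln M for M > m_min."""
    rng = random.Random(seed)
    a_m = q["b_m"] - 0.5 * sigma_m ** 2   # mean-of-log amplitude from amplitude of the mean
    ln_m, ln_l = [], []
    for _ in range(n_cl):
        x = math.log(sample_richness(rng, q["tau"], q["n_star"]) / 40.0)
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        m = a_m + q["alpha_m"] * x + sigma_m * z1
        # Correlated Gaussian deviate with correlation coefficient r relative to z1.
        l = q["a_l"] + q["alpha_l"] * x + q["sigma_l"] * (r * z1 + math.sqrt(1.0 - r * r) * z2)
        if m > math.log(m_min):           # mass-limited subsample
            ln_m.append(m)
            ln_l.append(l)
    n = len(ln_m)
    mean_m, mean_l = sum(ln_m) / n, sum(ln_l) / n
    var_m = sum((m - mean_m) ** 2 for m in ln_m) / n
    cov = sum((m - mean_m) * (l - mean_l) for m, l in zip(ln_m, ln_l)) / n
    slope = cov / var_m                   # ordinary least squares of ln Lx on ln M
    amp = mean_l - slope * mean_m
    rms = math.sqrt(sum((l - amp - slope * m) ** 2 for m, l in zip(ln_m, ln_l)) / n)
    return amp, slope, rms

q_central = {"tau": 2.61, "n_star": math.exp(3.66), "b_m": 0.95, "alpha_m": 1.06,
             "a_l": 1.91, "alpha_l": 1.63, "sigma_l": 0.83}
amp, slope, rms = predict_lx_m(sigma_m=0.45, r=0.9, q=q_central, n_cl=20_000)
print(amp, slope, rms)
```

The example call uses a smaller catalog than the $10^5$ clusters quoted in the text, purely to keep the run short. The returned amplitude is the intercept at $\ln M = 0$ rather than at the pivot mass, so it is not directly comparable to the pivoted amplitude prior.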

Using equation 9 and the function $\mathbf{x}(\mathbf{p},\mathbf{q})$ defined above,
we evaluate the probability distribution $P(\mathbf{p})$ along a grid of
points in $\sigma_{M|N} \in [0.2, 0.85]$ and $r \in [0.75, 1.0]$ with 25 grid
points per axis. A full run of our code then requires we perform
$25^2$ Monte Carlo integrals with $N_{draws} = 3000$ points in
each integration. Each draw also requires us to evaluate the
function $\mathbf{x}(\mathbf{p},\mathbf{q})$, which in turn requires generating a mock
catalog with $N_{cl} = 10^5$ clusters, so the procedure as a whole
is computationally expensive. To increase computational efficiency,
for each Monte Carlo evaluation of $P(\mathbf{p})$ we generate
a single cluster catalog that is used to estimate the likelihood
at every grid point. This correlates the values of $P$ along our
grid, but does not otherwise adversely affect our results.

TABLE 1
Scaling Relation and Cluster Abundance Priors

Parameter                                                         Prior
$\ln N_*$                                                         3.66 ± 0.10 (stat) ± 0.01 (sys)
$\tau$                                                            2.61 ± 0.06 (stat) ± 0.05 (sys)
$B_{M|N}$                                                         0.95 ± 0.07 (stat) ± 0.10 (sys)
$\alpha_{M|N}$                                                    1.06 ± 0.08 (stat) ± 0.08 (sys)
$B_{L|N}$                                                         1.91 ± 0.04 (stat) ± 0.09 (sys)
$\alpha_{L|N}$                                                    1.63 ± 0.06 (stat) ± 0.05 (sys)
$\sigma_{L|N}$                                                    0.83 ± 0.03 (stat) ± 0.10 (sys)
$A_{L|M} + 1.361\,\alpha_{L|M} + 1.5(\sigma_{L|M}^2 - 0.40^2)$    2.45 ± 0.08 (stat) ± 0.23 (sys)
$\alpha_{L|M}$                                                    1.61 ± 0.14 (stat)
$\sigma_{L|M}$                                                    0.40 ± 0.04 (stat)

Priors on the abundance function parameters ($N_*$ and $\tau$), as well as those
from the $M$-$N$ and $L_X$-$N$ relations are not taken directly from any single work
in the literature, but are discussed in detail in Appendix A. Priors on the $L_X$-$M$
relation are taken from Vikhlinin et al. (2008). Overall, we believe these priors
are fair, that is, they are neither overly optimistic nor overly pessimistic.

Our Monte Carlo approach requires that both the number of
clusters in the random catalogs $N_{cl}$ and the number of times
the likelihood function is evaluated $N_{draws}$ are sufficiently large
to achieve convergence. Our default values for $N_{cl}$ and $N_{draws}$
were selected to ensure the recovered likelihood is accurate
to within a dispersion of ~1-2% inside high likelihood regions.
The error in the recovered likelihood increases with
decreasing likelihood, but even in the tails of the distributions
our estimates are accurate to about 10%. This was explicitly
tested by running a coarse grid with our default values
for $N_{draws}$ and $N_{cl}$, and by repeating the analysis with both of
these parameters increased by a factor of two.$^{15}$

Finally, we emphasize that it is necessary to explicitly check
whether our results are sensitive to the $N > 10$ cut applied to
the maxBCG cluster sample. In particular, when selecting
a mass limited subsample of clusters, we need to ensure that
the mass limit $M_{min}$ be sufficiently large that the number of
clusters with $N < 10$ and $M > M_{min}$ is insignificant. We have
explicitly checked that for our adopted low mass cut $M_{min} >
3 \times 10^{14}\,M_\odot$ our results are robust to the richness cut $N >
10$ by repeating the analysis in a coarse grid using an $N > 8$
richness cut instead. We find that the likelihood estimates in
both cases are in agreement to within the expected accuracy
of our Monte Carlo approach.

4.3. Priors 

The priors used in our analysis are summarized in Table 1.
We follow the notation

$$q = \bar{q} \pm \sigma_q^{stat}\,(\mathrm{stat}) \pm \sigma_q^{sys}\,(\mathrm{sys}) \qquad (14)$$

where $\bar{q}$ is the central value, $\sigma_q^{stat}$ is the $1\sigma$ statistical error
on the parameter $q$ marginalized over all other parameters,
and $\sigma_q^{sys}$ is the systematic error. In all cases, we model statistical
errors as Gaussian, and we include known covariances
between different parameters. Systematic errors are assumed
to follow top-hat distributions, and the final prior distribution
is given by the convolution of these two functions.

15 It is worth noting that in order to create Figure 1, one needs to generate
cluster catalogs with $N_{cl} > 10^7$ clusters in order for the contours to appear
smooth by eye. However, $N_{cl} = 10^5$ is a sufficient number of clusters for
our analysis, since we only require that the noise in the likelihood be much
smaller than the width of the priors. Since the latter are quite wide, even
relatively noisy estimates of the $L_X$-$M$ relation are sufficient for constraining
the marginalized distribution.

FIG. 2. 68% and 95% confidence contours for $\sigma_{M|N}$ and $r_{L,M|N}$. Solid
lines show the results of our analysis. We find that X-ray luminosity and
mass are correlated at fixed richness. The breadth of the degeneracy region
shown above is almost exclusively due to uncertainties in the $L_X$-$M$ relation
parameters. Dashed contours demonstrate how our results would improve
if the $L_X$-$M$ amplitude and slope were known to within an accuracy
of $\Delta A_{L|M} = \Delta\alpha_{L|M} = 0.05$.

We believe that the priors contained in table 1 are fair, that 
is, they are neither overly aggressive nor overly conservative. 
A detailed discussion of our priors can be found in appendix 
A. 

5. RESULTS 

Figure 2 shows the 68% and 95% probability contours for
the parameters $\sigma_{M|N}$ and $r_{M,L|N}$. The likelihood peak occurs
at $\sigma_{M|N} = 0.46$ and $r_{M,L|N} = 0.90$. The marginalized means are
$\langle\sigma_{M|N}\rangle = 0.45$ and $\langle r_{M,L|N}\rangle = 0.91$.

We wish to determine whether the breadth of the likelihood
region in Figure 2 is limited by uncertainties in the scaling relations
of maxBCG clusters, or by uncertainties in the $L_X$-$M$
relation. To do so, we repeat our analysis with two new sets
of priors: for the first, we use a tight 0.05 statistical prior on
all nuisance parameters, but let the $L_X$-$M$ parameters float.
The second set of priors uses a tight 0.05 prior on each of
the $L_X$-$M$ parameters, but floats all other nuisance parameters
with the original priors. We find that using tight priors on
our nuisance parameters has negligible impact on the likelihood
regions recovered from our analysis. On the other hand,
the confidence regions obtained with the tight $L_X$-$M$ priors,
shown in Figure 2 as dashed curves, are tighter than those derived
from our original analysis. Thus, the dominant source
of error in our analysis is the uncertainty in the values of the
$L_X$-$M$ parameters. This can be easily understood based on
Figure 1. We can see from the figure that the uncertainty in
$r_{M,L|N}$ is largely due to the prior on the scatter in $L_X$ at fixed $M$,
which is already tight and thus does not change between our
fiducial prior and our tight priors. On the other hand, we can
see that both the amplitude and slope priors cut off regions
with high scatter. Tightening these priors excludes a larger
section of parameter space, and results in the tighter contours
observed in Figure 2.

Figure 3 shows the marginalized probability distributions
for $\sigma_{M|N}$ and $r_{M,L|N}$. The solid curves correspond to our original
analysis, while the dashed curves illustrate the results one
expects assuming our hypothetical tight priors for the $L_X$-$M$
relation parameters. We find that the logarithmic scatter in
mass at fixed richness and the correlation coefficient between
$\ln M$ and $\ln L_X$ are

$$\sigma_{M|N} = 0.45^{+0.20}_{-0.18} \ (95\%\ \mathrm{CL}) \qquad (15)$$
$$r_{L,M|N} > 0.85 \ (95\%\ \mathrm{CL}). \qquad (16)$$

Assuming our hypothetical tight $L_X$-$M$ priors, the constraints
become $\sigma_{M|N} = 0.42^{+\ldots}_{-\ldots}$ and $r_{L,M|N} > 0.85$ (95% CL). We
emphasize that these latter constraints are only meant as a
guide to the accuracy one could achieve with this method if
the $L_X$-$M$ relation were known to about 5% accuracy.

It is evident from our results that cluster richness is not as
effective a mass tracer as X-ray derived masses. Indeed, even
total (i.e. not core-excluded) X-ray luminosity is a more
faithful mass tracer than the adopted richness measure of the
maxBCG catalog, as demonstrated both by the smaller scatter
and the very large correlation coefficient. Note that the latter
indicates that, at fixed richness, over-luminous clusters are almost
guaranteed to also be more massive than average. This
is an important result which forms the basis for a concurrent
paper in which we improve our richness estimates by demanding
tighter correlations in the $L_X$-richness relation (Rozo et al.
2008).

6. COMPARISON TO OTHER WORK 

There are not many previous results against which our measurements of
scatter in mass at fixed richness may be compared. One possible
reference point is the upper limit based on the error bar in the weak
lensing mass estimates of Johnston et al. (2007). More specifically,
assuming that the error in $\langle M|N \rangle$ is entirely due to
the intrinsic scatter in $M$ at fixed $N$, it follows that the error
in the mass is simply
$\Delta M / \langle M|N \rangle \approx \Delta \ln M = \sigma_{M|N}/\sqrt{n(N)}$,
where $\Delta M$ is the observed error and $n(N)$ is the number of
clusters with richness $N$. For the richest bin, which provides the
tightest constraint, Johnston et al. (2007) find
$\langle M \rangle = (8.1 \pm 1.3) \times 10^{14}\,M_\odot$. The bin
contains $n = 47$ clusters, so an upper limit to the scatter in mass
at fixed richness is
$\sigma_{M|N} < \sqrt{n}\,(\Delta M / \langle M \rangle) = 1.10$.
Figure 3 shows that our results easily satisfy this upper limit on the
scatter.
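
The upper limit quoted above is simple arithmetic, and can be checked with a few lines of Python. This is only an illustrative check using the numbers quoted in the text; the variable names are ours.

```python
import math

# Values quoted in the text for the richest bin of Johnston et al. (2007)
mean_mass = 8.1      # <M>, in units of 1e14 solar masses
mass_error = 1.3     # Delta M, same units
n_clusters = 47      # number of clusters in the bin

# If the whole error were intrinsic scatter: Delta M / <M> = sigma / sqrt(n)
sigma_upper = math.sqrt(n_clusters) * mass_error / mean_mass
print(f"sigma_M|N < {sigma_upper:.2f}")
```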

The only other measurement of the scatter in mass at fixed 
richness for maxBCG clusters is that found in Becker et al. 
(2007). These scatter estimates are obtained as follows: first, 
Becker et al. (2007) select all maxBCG clusters whose central 
galaxy has a spectroscopic redshift. They then bin the clusters 
in richness, and compute the velocity relative to the BCG of 
every galaxy member with spectroscopic data. The recovered 
velocity distribution of galaxies is found to be non-Gaussian. 
Assuming that the velocity distribution of galaxies of halos 
of fixed mass is exactly Gaussian, and that the observed non- 
Gaussianity is entirely due to mass-mixing within a richness 
bin, Becker et al. (2007) estimate the scatter in mass at fixed 
richness based on the observed non-Gaussianity of the veloc- 
ity distribution. 

An updated version of the results from Becker et al. (2007) can be
seen in Figure 4. The only difference between this plot and the
corresponding figure in Becker et al. (2007) is that here we have made
use of the additional spectroscopic data from the SDSS Data Release 6
(Adelman-McCarthy et al. 2008), which results in tighter error bars.
Also shown in the figure as a horizontal band is the 95% confidence
region from our analysis. As we can see, our scatter estimate appears
to be systematically lower than that of Becker et al. (2007), a
discrepancy first noted in Rykoff et al. (2008a, more on the relation
between our work and theirs below).

FIG. 3. Likelihood distributions for $\sigma_{M|N}$ and $r_{M,L|N}$.
The distributions are marginalized over all other parameters. Solid
lines are the results of our analysis, while dashed lines are the
results obtained assuming tight priors on the $L_X$-$M$ parameters.
Note the latter set of curves are presented only to give a sense of
how our result would improve with better understanding of the
$L_X$-$M$ relation.

Such a bias is not entirely unexpected, as we now know that a
significant fraction of clusters have their BCGs misidentified, a
problem that was not yet known, and was therefore unaccounted for, at
the time the Becker et al. (2007) results came out. To get a better
understanding of how our results and those of Becker et al. (2007)
compare, we can use our results along with the miscentering
probability model from Johnston et al. (2007) to predict the scatter
that Becker et al. (2007) observed given this miscentering systematic.
We proceed as follows. First, we use our best fit model for the
abundance distribution to generate a mock catalog with
$2 \times 10^5$ clusters with $N \geq 10$. Each of these clusters is
assigned a mass by drawing from the $P(M|N)$ distribution defined by
the values of $\sigma_{M|N}$ corresponding to the two 95% confidence
limits on $\sigma_{M|N}$. These assigned masses are then turned into
velocity dispersions using the scaling relation from Evrard et al.
(2008).

FIG. 4. Comparison of the scatter in mass at fixed richness estimated
in this work (solid band) and that of Becker et al. (2007) (diamonds
with error bars). The dashed band shows how the scatter we measured is
expected to be affected by miscentering, which allows us to better
compare our results to those of Becker et al. (2007). We find that,
once miscentering is properly taken into account, the two results
appear to be in reasonable agreement.

At this point, we have a cluster catalog where each cluster has a
richness and a velocity dispersion. If a cluster is miscentered, we
expect that in most cases the new center will be a cluster galaxy.
Assuming this is the case, and that BCGs are at rest at the center of
a cluster, the velocity dispersion of cluster galaxies relative to
random satellites will be a factor of $\sqrt{2}$ higher than relative
to the BCG. Using the miscentering model described in Johnston et al.
(2007) for $p(N)$, the probability that a cluster of richness $N$ is
correctly centered, we randomly label clusters as properly centered or
miscentered, and boost the "observed" velocity dispersion of the
clusters labeled as miscentered by the expected amount. The clusters
are assigned a new mass based on their "observed" velocity
dispersions, and the corresponding scatter in the M-N relation is
estimated. We repeat this procedure $10^3$ times in order to compute
the mean systematic correction due to miscentering.
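
The effect of this procedure on the recovered scatter can be illustrated with a minimal Monte Carlo sketch at a single fixed richness. The sketch below is not the pipeline used in this work: the centering probability `p_centered` is a free input rather than the Johnston et al. (2007) model, and the velocity dispersion is assumed to scale as $M^{\alpha}$ with a virial-like slope $\alpha = 1/3$ chosen purely for illustration.

```python
import math
import random
import statistics

def miscentered_scatter(sigma_true, p_centered, alpha=1.0 / 3.0,
                        n_clusters=50_000, seed=0):
    """Scatter in ln M recovered from velocity dispersions when a fraction
    (1 - p_centered) of clusters at fixed richness is miscentered.

    alpha is an assumed slope of the sigma_v-M scaling (illustrative only).
    """
    rng = random.Random(seed)
    ln_mass_obs = []
    for _ in range(n_clusters):
        ln_mass = rng.gauss(0.0, sigma_true)     # ln M about the mean relation
        ln_sigma_v = alpha * ln_mass             # sigma_v proportional to M**alpha
        if rng.random() > p_centered:            # cluster labeled as miscentered
            ln_sigma_v += 0.5 * math.log(2.0)    # boost sigma_v by sqrt(2)
        ln_mass_obs.append(ln_sigma_v / alpha)   # invert the scaling relation
    return statistics.pstdev(ln_mass_obs)

# Hypothetical inputs: true scatter 0.45, 70% of clusters correctly centered
boosted = miscentered_scatter(sigma_true=0.45, p_centered=0.7)
unboosted = miscentered_scatter(sigma_true=0.45, p_centered=1.0)
```

Under these assumptions the recovered variance is the true variance plus $p(1-p)\,[\ln\sqrt{2}/\alpha]^2$, so the apparent scatter grows as the centering probability moves away from unity, which is the qualitative behavior described above.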

Our predictions for the scatter values observed by Becker 
et al. (2007) are shown in Figure 4 with dashed lines, and cor- 
respond to the 95% confidence interval from our analysis. We 
see that miscentering introduces a richness dependent correc- 
tion that boosts the scatter in the recovered velocity dispersion 
and places it in significantly better agreement with the data 
from Becker et al. (2007). 

The agreement with the Becker et al. (2007) data is an interesting
result. Perhaps the single most difficult systematic effect that had
to be addressed in the Becker et al. (2007) analysis is the validity
of the assumption that non-Gaussianities in the velocity distribution
of stacked clusters are entirely due to mass-mixing. The reasonable
agreement between our results and those of Becker et al. (2007)
suggests that their assumption is indeed justified, though a robust
conclusion will have to wait until a more detailed analysis is
performed, especially given the possibility of velocity bias of the
galaxy population (i.e. if satellite galaxies have a velocity
dispersion different from that of the dark matter).

The analysis in this work is also very closely related to that of
Rykoff et al. (2008a). Rykoff et al. (2008a) sought to constrain the
$L_X$-$M$ relation of clusters by fitting the scaling of
$\langle L_X|N \rangle$ with $\langle M|N \rangle$. However, as
recognized in Rykoff et al. (2008a), in order to fully interpret their
result in terms of the traditional definition of the $L_X$-$M$
relation, i.e. the mean X-ray luminosity at fixed mass, one needs to
know both the scatter in mass at fixed richness, and the corresponding
correlation coefficient with $L_X$. Given that these two quantities
are unknown, but that the $L_X$-$M$ relation is already constrained
from X-ray surveys, it seems reasonable to suggest that a better use
of the lensing and X-ray data of maxBCG clusters is to use our
knowledge of the $L_X$-$M$ relation to constrain the scatter in mass
at fixed richness and the corresponding correlation coefficient, as
was done in this work.

Our work differs from the ideas presented in Rykoff et al. (2008a) in
another significant way. While our analysis employs only
$P(L_X, M|N)$ and $n(N)$, Rykoff et al. (2008a) used the halo mass
function $dn/dM$ and the probability distribution $P(L_X, N|M)$ to
interpret their measurements. This has the important drawback that in
doing so, one needs to assume a cosmological model in order to compute
the halo mass function, rendering their interpretation cosmology
dependent. By focusing on the quantities that are directly observable,
i.e. $n(N)$ and $P(L_X, M|N)$, we are able to avoid this difficulty.
The price we pay for this is that rather than constraining the scatter
in richness at fixed mass, which is the more directly relevant
quantity from a cosmological perspective, we constrain instead the
scatter in mass at fixed richness. While this makes implementing such
a constraint a little more cumbersome in a cosmological analysis, the
fact that the constraint itself is cosmology independent is obviously
of paramount importance.

7. COSMOLOGICAL CONSEQUENCES 

As mentioned in the introduction, to obtain an unbiased estimate of
the halo mass function based on the observed cluster richness function
requires that we understand the scatter between cluster richness and
halo mass. Given our lognormal assumption, and the fact that the mean
mass-richness relation is already known from weak lensing, our
measurement of the scatter in this scaling relation fully determines
the probability distribution $P(M|N)$. Thus, we are now in a position
to determine the halo mass function of the local universe with the
maxBCG cluster catalog.

Let us then define $n_i = n(M_i)$ as the number of halos within a
logarithmic mass bin of width $\Delta \ln M$ centered about $M_i$,

$n_i = \frac{dn}{d\ln M}\,\Delta \ln M.$ (17)

Given our cluster catalog and $P(M|N)$, we construct an estimator
$\hat{n}_i$ for $n_i$ by randomly drawing a mass from $P(M|N)$ for
each halo in the cluster catalog, and then counting the number of
halos within the logarithmic mass bin centered about $M_i$. Note that
since the mass of each cluster is a random variable, our mass function
estimator $\hat{n}_i$ is itself a random variable. The mean and
covariance matrix of $\hat{n}_i$ can easily be obtained by making
multiple realizations of $\hat{n}_i$ and averaging over the resulting
mass functions.
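
A minimal sketch of this estimator is given below, written in plain Python. It is an illustration under stated assumptions, not the code used for the results in this paper: the richness list is a toy catalog rather than the maxBCG data, the mass bins are arbitrary, and we assume the lognormal $P(M|N)$ is parameterized so that its mean is $\exp(B)(N/40)^{\alpha}$ in units of $10^{14}\,M_\odot$.

```python
import math
import random

def mass_function_realizations(richness, b, alpha, sigma, ln_m_edges,
                               n_real=200, seed=1):
    """Return (mean, covariance) of binned halo counts over n_real draws."""
    rng = random.Random(seed)
    n_bins = len(ln_m_edges) - 1
    draws = []
    for _ in range(n_real):
        counts = [0] * n_bins
        for n in richness:
            # Lognormal P(M|N); the -sigma^2/2 term keeps the mean mass equal
            # to exp(b) * (N/40)^alpha (an assumption of this sketch).
            mu = b + alpha * math.log(n / 40.0) - 0.5 * sigma ** 2
            ln_m = rng.gauss(mu, sigma)
            for i in range(n_bins):
                if ln_m_edges[i] <= ln_m < ln_m_edges[i + 1]:
                    counts[i] += 1
                    break
        draws.append(counts)
    mean = [sum(d[i] for d in draws) / n_real for i in range(n_bins)]
    cov = [[sum((d[i] - mean[i]) * (d[j] - mean[j]) for d in draws)
            / (n_real - 1) for j in range(n_bins)] for i in range(n_bins)]
    return mean, cov

# Toy catalog (hypothetical richness values, not the maxBCG data)
toy_richness = [10 + (k % 60) for k in range(500)]
edges = [math.log(x) for x in (1.0, 2.0, 4.0, 8.0, 16.0)]  # M in 1e14 Msun
mean_counts, cov_counts = mass_function_realizations(
    toy_richness, b=0.95, alpha=1.06, sigma=0.45, ln_m_edges=edges)
```

The off-diagonal elements of `cov_counts` are generally nonzero because a cluster that scatters out of one bin must land in another (or outside the binned range), which is one source of the correlated errors discussed in the text.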

In practice, we also need to marginalize our results over
uncertainties in $P(M|N)$ and over uncertainties in the richness
function $n(N)$. To do so, we randomly draw the parameters
$x = \{B_{M|N}, \alpha_{M|N}, \sigma_{M|N}\}$, and then resample the
cluster richness function to obtain a new estimate of $\hat{n}_i$. The
whole procedure is iterated $10^5$ times, and the mean and covariance
matrix of the number counts in each of our logarithmic mass bins is
computed. 16

Figure 5 shows the mass function recovered through our analysis. To
turn our number counts into a density, we assumed a WMAP5 cosmology
(Dunkley et al. 2008), with $\Omega_m = 0.27$ and $h = 0.72$, and a
photometric redshift error $\Delta z = 0.01$ (used for computing the
effective volume of the sample). The diamonds correspond to our
estimated means, and the error bars are the square root of the
diagonal elements of the covariance matrix. We emphasize that the
error bars are heavily correlated. The mean and covariance matrix of
the recovered halo mass function can be found in Appendix B.

16 We again checked explicitly that the mass cut
$M_{min} = 3 \times 10^{14}\,M_\odot$ is large enough for our results
to be insensitive to the maxBCG richness cut $N \geq 10$.

FIG. 5. The maxBCG mass function. Cluster counts were converted to
densities assuming $\Omega_m = 0.27$ and $h = 0.71$, the same
cosmology assumed in the lensing measurements (Johnston et al. 2007).
The error bars shown are due to the scatter in the mass-richness
relation, and are strongly correlated. For comparison, we have also
plotted the Tinker et al. (2008) mass function corresponding to the
WMAP5 95% confidence region for $\sigma_8$,
$0.724 < \sigma_8 < 0.868$. All other parameters are held fixed to the
central values reported in Dunkley et al. (2008). Our data are
consistent with the WMAP5 results, though they might suggest a
slightly higher power spectrum normalization.

Also shown in Figure 5 with dotted lines are the halo mass functions
at $z = 0.23$ predicted by WMAP5 assuming the Tinker et al. (2008)
mass function. For both curves, we set all cosmological parameters to
the central values reported in Dunkley et al. (2008), except for
$\sigma_8$, which is set to $\sigma_8 = 0.868$ for the upper curve and
$\sigma_8 = 0.724$ for the lower curve. These two values define the
95% confidence interval for $\sigma_8$ in Dunkley et al. (2008). As we
can see, the mass function recovered from our analysis is fully
consistent with the WMAP5 cosmology, though it seems to push for
values of $\sigma_8$ on the high end of their allowed region. A
detailed cosmological analysis of our data will be presented in a
subsequent paper (Rozo et al., in preparation).

8. SUMMARY AND CONCLUSIONS 

We have shown that by combining the information in the maxBCG richness
function, the mean richness-mass relation, the mean and scatter of the
$L_X$-richness relation, and the mean and scatter of the $L_X$-$M$
relation, we can constrain both the scatter in mass at fixed richness
for maxBCG clusters, as well as the correlation coefficient between
mass and $L_X$ at fixed richness. We find

$\sigma_{M|N} = 0.45^{+0.20}_{-0.18}$ (95% CL) (18)
$r_{L,M|N} > 0.85$ (95% CL). (19)

These constraints are dominated by uncertainties in the $L_X$-$M$
relation, and can be significantly tightened if our understanding of
the $L_X$-$M$ relation improves. We also found our results are
consistent with those presented in Becker et al. (2007) once
miscentering of maxBCG clusters is taken into account.

Our lower limit on the correlation between $M$ and $L_X$ at fixed
richness constitutes the first observational constraint on a
correlation coefficient involving two different halo mass tracers.
Note that the large correlation between $L_X$ and $M$ implies that
$L_X$, even without core exclusion, is a significantly better mass
tracer than the maxBCG richness estimator (i.e. at fixed richness,
over-luminous clusters are nearly always more massive). This is an
important result, which we use in a concurrent paper to help us define
new richness estimators that are better correlated with cluster mass
(Rozo et al. 2008).

Using our results, and assuming $\Omega_m = 0.27$ and $h = 0.71$, we
have estimated the halo mass function at $z = 0.23$, corresponding to
the median redshift of the cluster sample. We find that our recovered
mass function is in good agreement with the mass function predicted by
Tinker et al. (2008) for the WMAP5 cosmology (Dunkley et al. 2008). A
detailed cosmological analysis will be presented in a forthcoming
paper (Rozo et al., in preparation).

Our work sheds new light on the interrelationship of bulk properties
of massive halos. We have used weak lensing, X-ray luminosities, and
optical richness estimates to constrain the scatter in the
richness-mass relation, which can lead to improved cosmological
constraints. In principle, one could also turn this question around:
assuming a cosmology, one could constrain the scatter in the
richness-mass relation, which would then allow one to place
constraints on the amplitude, slope, and scatter of the $L_X$-$M$
relation. Such an analysis would be interesting in that, by doing so,
one could compare the predicted amplitude of the $L_X$-$M$ relation to
that derived from hydrostatic mass estimates, thereby directly probing
the amount of non-thermal pressure support in galaxy clusters. Note
that even though this question can also be directly addressed by
comparing weak lensing and X-ray mass estimates of individual
clusters, the analysis suggested here would benefit from having small
uncertainties, whereas projection effects result in rather noisy weak
lensing mass estimates for individual systems.

APPENDIX
PRIORS 

Abundance Priors 

Our estimates of the $L_X$-$M$ parameters depend on the abundance function of maxBCG clusters, which is observationally determined, but not known to infinite precision. Here, we fit the observed abundance function using a Schechter function, such that the mean number of clusters $\mu$ of richness $N$ is

$\mu(N) = n_0\,(N/40)^{-\tau} \exp(-N/N_*)$. (A1)

The amplitude $n_0$ is chosen such that the total number of clusters exactly equals the observed number of clusters. We set this normalization condition because we are interested only in the shape of the richness function, and not in its amplitude.

The fits are done by maximizing the likelihood of the observed distribution, binned in bins of width $\Delta N = 1$. We assume that the probability of observing $n$ clusters in a bin of richness $N$ is Poisson, with

$P(n) = \frac{\mu^n e^{-\mu}}{n!}$. (A2)

For numerical purposes, we cut the distribution at $N_{max} = 300$, which is sufficiently large to not affect our fits. We emphasize that we use the above likelihood only to define estimators for $N_*$ and $\tau$, since, as discussed below, both goodness of fit and errors in the parameter estimation are obtained through Monte Carlo simulation.
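
For concreteness, the binned Poisson likelihood described above can be written down in a few lines. The sketch below is an illustration, not the fitting code used in this work: the function names are ours, the observed counts are passed as a hypothetical dictionary mapping richness to number of clusters, and the maximization step itself (e.g. with a simplex routine) is left out.

```python
import math

def schechter_shape(n, tau, n_star):
    """Unnormalized richness function (N/40)^(-tau) * exp(-N/N_star)."""
    return (n / 40.0) ** (-tau) * math.exp(-n / n_star)

def log_likelihood(counts, tau, n_star, n_min=10, n_max=300):
    """Poisson log-likelihood of binned counts (bins of width 1).

    counts maps richness N -> observed number of clusters. The amplitude
    n_0 is fixed so that the model total equals the observed total.
    """
    bins = range(n_min, n_max + 1)
    total_obs = sum(counts.get(n, 0) for n in bins)
    total_shape = sum(schechter_shape(n, tau, n_star) for n in bins)
    n_0 = total_obs / total_shape
    log_l = 0.0
    for n in bins:
        mu = n_0 * schechter_shape(n, tau, n_star)
        k = counts.get(n, 0)
        log_l += k * math.log(mu) - mu - math.lgamma(k + 1)
    return log_l

# Toy counts generated from the model itself (not the maxBCG data)
toy_counts = {n: round(1000.0 * schechter_shape(n, 2.61, 38.9))
              for n in range(10, 301)}
ll_true = log_likelihood(toy_counts, tau=2.61, n_star=38.9)
ll_off = log_likelihood(toy_counts, tau=2.00, n_star=38.9)
```

Because the amplitude is tied to the observed total, only the two shape parameters remain free, which is why the fit constrains the shape of the richness function and not its normalization.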

The richness distribution is fit over the range $N \geq 10$ by maximizing the log-likelihood function using an amoeba routine. To estimate our errors, we follow a Monte Carlo approach and resample the observed richness function $10^4$ times. We find that the parameters $N_*$ and $\tau$ are significantly correlated, with the probability distribution being Gaussian in $\tau$ and $\ln N_*$. The best fit parameters are

$\langle \ln N_* \rangle = 3.66 \pm 0.10$ (A3)
$\langle \tau \rangle = 2.61 \pm 0.06$ (A4)

with a correlation coefficient

$r = 0.94$. (A5)

To assess goodness of fit, we generate $10^4$ mock catalogs with as many clusters as the real data from the probability distribution specified by $\langle \ln N_* \rangle$ and $\langle \tau \rangle$. We compute the likelihood for each of these mock catalogs, and compare the corresponding likelihood distribution to that observed in the real data. We find that our fit is statistically acceptable.

The most significant systematic errors affecting our measurements of the shape of the richness function are completeness and purity variations in the cluster catalog. Rozo et al. (2007a) have shown that the maxBCG catalog is over 90% pure and complete for $N \geq 10$. Here, we take a conservative approach, and consider the change in the best fit parameters assuming the observed counts are rescaled by a completeness/purity correction factor $A$ given by

$A = \min\{1, 0.9 + 0.1 \ln(N/10.0)/\ln(10.0)\}$. (A6)

This corresponds to a 10% decrease in the observed counts at $N = 10$ while holding the counts at $N = 100$ constant. Upon refitting the data after this correction we find systematic offsets

$(\Delta \ln N_*)_{sys} = 0.01$ (A7)
$(\Delta \tau)_{sys} = 0.05$ (A8)

which we adopt as our systematic error. Note the systematic offsets are allowed to be both positive and negative, since the correction multiplier $A$ above could easily be larger than unity rather than smaller than unity.
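
As a quick check of the behavior described above, the correction factor can be evaluated at the two reference richnesses. The sketch assumes the factor is capped at unity, which is our reading of equation A6 and is what holds the counts at $N = 100$ (and above) fixed; the function name is ours.

```python
import math

def correction_factor(n):
    """Completeness/purity rescaling of the counts, capped at unity (assumed)."""
    return min(1.0, 0.9 + 0.1 * math.log(n / 10.0) / math.log(10.0))

factor_at_10 = correction_factor(10)    # 10% decrease in the counts
factor_at_100 = correction_factor(100)  # counts left unchanged
```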

M-N Priors 

Our priors on the M-N relation are based on the results presented in Johnston et al. (2007), Mandelbaum et al. (2008b), and Mandelbaum et al. (2008a). To assign our priors, we first compare the results of these two works as a means of assessing systematic uncertainties in the mass parameters. We then focus exclusively on the Johnston et al. (2007) results to place our final priors on the M-N relation. The latter choice reflects the fact that Johnston et al. (2007) report weak lensing mass estimates for several mass definitions, among them $M_{500c}$, the relevant quantity in the $L_X$-$M$ relation of Vikhlinin et al. (2008).

Let us then begin by discussing the Johnston et al. (2007) results. While Johnston et al. (2007) quote a power-law fit for the mean mass at fixed richness, $\langle M|N \rangle$, this fit is based on a non-public version of the maxBCG catalog that extends to a richness of $N = 3$ (the catalog for clusters with $N < 10$ is not public). Since the maxBCG catalog is only known to be highly complete and pure in the range $N \geq 10$, we have refit the Johnston et al. (2007) masses restricting ourselves to the range $N \geq 9$. This slightly lower cut is necessary due to the richness binning in Johnston et al. (2007). We find that the mass $M_{180b}$ within an overdensity threshold of 180 relative to the mean matter density is

$\frac{\langle M_{180b}|N \rangle}{10^{14}\,h^{-1} M_\odot} = \exp(0.25 \pm 0.07)\,(N/20)^{1.18 \pm 0.09}$ (A9)

with a correlation coefficient $r = -0.43$ between the amplitude and slope parameters.

Mandelbaum et al. (2008a) performed a similar but independent weak lensing analysis of the maxBCG clusters, though using $M_{200b}$ as their mass variable. They find

$\frac{\langle M_{200b}|N \rangle}{10^{14}\,h^{-1} M_\odot} = \exp(0.45 \pm 0.08)\,(N/20)^{1.15 \pm 0.14}$. (A10)

To compare against the Johnston et al. (2007) values, we use the Hu & Kravtsov (2003) mass conversion formulae to find an approximate power law relation between $M_{200b}$ and $M_{180b}$ over the range $5 \times 10^{14}\,M_\odot < M_{200b} < 10^{15}\,M_\odot$. We find $M_{180b} = 1.022\,M_{200b}$, which is only a 2% correction. Applying this correction, we find that the corresponding M-N parameters from Mandelbaum et al. (2008a) are

$\frac{\langle M_{180b}|N \rangle}{10^{14}\,h^{-1} M_\odot} = \exp(0.47 \pm 0.07)\,(N/20)^{1.15 \pm 0.14}$. (A11)

We find that the slopes of the Johnston et al. (2007) and Mandelbaum et al. (2008a) results are nearly identical, but that the masses of Mandelbaum et al. (2008a) are systematically higher by $\approx 25\%$. This difference can be traced back to how the lensing critical surface density for each of the two works is estimated.

In general, lensing masses are proportional to the quantity $1/\langle \Sigma_{crit}^{-1} \rangle$, where $\Sigma_{crit}$ is the lensing critical surface density, and the average is to be computed over the source redshift distribution. Given multi-band photometric data $m$ for each galaxy, one way to compute $\langle \Sigma_{crit}^{-1} \rangle$ is to use a photometric redshift estimator $z_{photo}(m)$, and then assume that the true source redshift distribution is identical to the photometric redshift distribution. Mandelbaum et al. (2008b) have shown that such a simple approach typically results in biased lensing mass estimates, but they also demonstrate that it is possible to achieve unbiased results using the probability distribution $P(z|m)$.

The weak lensing analysis in Sheldon et al. (2007), on which the results from Johnston et al. (2007) are based, falls somewhere in between these two approaches. While Sheldon et al. (2007) does in fact make use of photometric redshifts, they do not simply assume that the source redshift distribution is identical to the photometric redshift distribution. Rather, they construct a probability distribution $P(z|z_{photo})$, and use this probability to estimate $\langle \Sigma_{crit}^{-1} \rangle$. As it turns out, evaluating $\langle \Sigma_{crit}^{-1} \rangle$ in this way leads to results that are nearly identical to those obtained by simply setting $z = z_{photo}$. Thus, even though the approach used in Sheldon et al. (2007) is more sophisticated than the simple case considered in Mandelbaum et al. (2008b), we expect the Sheldon et al. (2007) results to be biased but correctable as prescribed in Mandelbaum et al. (2008b). This correction amounts to a boost of the lensing masses by a factor of $1.18 \pm 0.02\,(stat) \pm 0.02\,(sys)$. The statistical error bar in the correction is added in quadrature to the statistical error bar from our fit, which results in

$\frac{\langle M_{180b}|N \rangle}{10^{14}\,h^{-1} M_\odot} = \exp(0.42 \pm 0.07)\,(N/20)^{1.18 \pm 0.09}$. (A12)

These new values for the Johnston et al. (2007) data are in considerably better agreement with those of Mandelbaum et al. (2008a). There remains, however, a systematic 5% difference between the two amplitudes, as well as a small difference $\Delta \alpha_{M|N} = 0.028$ between the two slopes.

A possible culprit for this systematic 5% offset is the difference in how miscentering is accounted for in the data models. The word miscentering refers to the fact that when finding clusters, one will inevitably find clusters that are improperly centered, either due to failures of the cluster finding algorithm, or simply because there is no obvious center of the cluster based on its optical image. Such offsets between the true and assigned centers are problematic because if a cluster is miscentered, the corresponding lensing signal is weakened, resulting in systematically low mass estimates.

To determine whether the remaining offset between Mandelbaum et al. (2008a) and Johnston et al. (2007) is consistent with differences in the miscentering model, we refit our data assuming no errors on the miscentering corrections. We find

$\langle M_{180b}|N \rangle = \exp(0.42 \pm 0.04)\,(N/20)^{1.17 \pm 0.07}$ (A13)

with a correlation coefficient $r = -0.15$. Note that these errors are smaller than the errors quoted before, as they should be, given that this new fit does not marginalize over a wide range of miscentering models. By subtracting the two sets of errors in quadrature, we find that the miscentering priors adopted in Johnston et al. (2007) correspond to an error 0.043 in the amplitude and 0.05 in the slope. Thus, the Mandelbaum et al. (2008a) mass measurements are well within the centering error included in the analysis of Johnston et al. (2007).

Nevertheless, it is unclear whether miscentering can in fact account for the difference between the Johnston et al. (2007) and Mandelbaum et al. (2008b) results. More specifically, Mandelbaum et al. (2008a) also performed their analysis including the Johnston et al. (2007) model for miscentering, and find that after applying the centering correction their best fit $M_{180b}$-$N$ relation becomes

$\langle M_{180b}|N \rangle = \exp(0.53 \pm 0.07)\,(N/20)^{1.08 \pm 0.14}$. (A14)

Comparing this to equation A12, we find that including a miscentering correction in the Mandelbaum et al. (2008a) analysis increases the tension between the two results. Moreover, it suggests that the difference between the two results is due to some other form of systematic difference between the two analysis pipelines. In light of this, we opt for introducing a systematic correction to the Johnston et al. (2007) results of +0.06 and -0.05 for the amplitude and slope respectively. We also introduce systematic errors of the same magnitude as this systematic correction, so that our final result is

$\langle M_{180b}|N \rangle = \exp[0.48 \pm 0.07\,(stat) \pm 0.06\,(sys)]\,(N/20)^{1.13 \pm 0.09\,(stat) \pm 0.05\,(sys)}$. (A15)

Note the central values of the original Johnston et al. (2007) analysis (corrected for photometric redshift bias) as well as the Mandelbaum et al. (2008a) results both with and without miscentering corrections are all encompassed by our systematic error.

Now, in this work we are interested more in the $M_{500c}$-$N$ (henceforth simply M-N) relation than in the $M_{200c}$-$N$ relation, since it is the former mass which is accessible to X-ray studies. To constrain the M-N relation we use the quoted $M_{500c}$ mass measurements from Johnston et al. (2007), re-scaling their $M_{200c}$ errors to $M_{500c}$ by assuming the relative errors are constant. A fit to the data results in

$\frac{\langle M|N \rangle}{10^{14}\,M_\odot} = \exp[0.68 \pm 0.07]\,(N/40)^{1.11 \pm 0.08}$ (A16)

with a correlation coefficient $r = 0.45$.

We now boost this expression by a factor of 1.18 due to the photometric redshift bias correction, and add the systematic corrections +0.06 and -0.05 to the amplitude and slope respectively as per our discussion of the $M_{180b}$-$N$ relation. We also include a systematic error on the amplitudes and slopes of this same magnitude. We obtain

$B_{M|N} = 0.91 \pm 0.07\,(stat) \pm 0.06\,(sys)$ (A17)
$\alpha_{M|N} = 1.06 \pm 0.08\,(stat) \pm 0.05\,(sys)$. (A18)
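
The step from equation A16 to equations A17 and A18 is a short piece of bookkeeping, sketched below with the numbers quoted in the text. We assume here that the statistical error of the photometric redshift correction enters as a fractional error added in quadrature to the fit error on the amplitude, as was done for the $M_{180b}$-$N$ relation; the variable names are ours.

```python
import math

b_fit, b_stat = 0.68, 0.07          # amplitude and its error from equation A16
alpha_fit = 1.11                    # slope from equation A16
boost, boost_stat = 1.18, 0.02      # photometric redshift bias correction
sys_amp, sys_slope = 0.06, -0.05    # systematic corrections adopted in the text

# A multiplicative boost of the mass shifts the log-amplitude by ln(boost)
b_corrected = b_fit + math.log(boost) + sys_amp
alpha_corrected = alpha_fit + sys_slope

# Fractional error on the boost added in quadrature to the fit error (assumed)
b_stat_total = math.hypot(b_stat, boost_stat / boost)

print(f"B = {b_corrected:.2f} +/- {b_stat_total:.2f} (stat)")
print(f"alpha = {alpha_corrected:.2f}")
```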

The final systematics we consider here are the purity and completeness of the sample. Now, as long as the completeness is 
not correlated with mass, completeness should not in any way bias the recovered parameters of the M-N relation, though it 
obviously affects the error bars due to lower statistics. 

The same cannot be said of purity. If only a fraction $p$ of the clusters are actually good matches to real halos in the universe, then a fraction $1-p$ of the clusters will have a lensing signal that is significantly different from the mean signal. As an extreme case, we can consider what happens if a fraction $1-p$ of the clusters had no mass associated with them. In that case, the observed mean mass is simply $M_{obs} = p\,M_{true}$, where $M_{true}$ is the true mean, so one should boost the observed masses by a factor of $1/p$ to obtain an unbiased estimate. For $p = 0.9$, this amounts to an increase in $B_{M|N}$ of magnitude $\Delta B_{M|N} = 0.1$. Now, Rozo et al. (2006) showed that the purity of the maxBCG cluster sample is expected to be above 90% over the range of richnesses considered here, and the increase in $B_{M|N}$ quoted above is undoubtedly an overestimate of the necessary correction since even false cluster detections will have excess mass associated with them. In light of this, we have adopted a one-sided systematic error bar $\Delta B_{M|N} = 0.08$ to take into account the impact of purity in the recovered M-N relation. The error bar is one sided since we expect impurities will tend to decrease the observed mean mass. We can, however, turn this prior into a normal double-sided prior by including a systematic correction $\Delta B_{M|N} = 0.04$ to the central value, and setting the systematic error bar to the same magnitude as the central value shift. We can also get a rough estimate for the systematic error on the slope due to purity by assuming that the quoted systematic error in the amplitude should be made only in the limit of high or low richness. If that were the case, using the fact that the slope is measured over a decade of richness values, the corresponding slope would be

$\frac{1.06 \ln(10) + 0.08}{\ln(10)}$ (A19)

which amounts to a systematic offset $\Delta \alpha = 0.03$. These systematic error bars are added linearly to our previous systematic error. Our final set of priors for the M-N relation is

$B_{M|N} = 0.95 \pm 0.07\,(stat) \pm 0.10\,(sys)$ (A20)
$\alpha_{M|N} = 1.06 \pm 0.08\,(stat) \pm 0.08\,(sys)$ (A21)

with a correlation coefficient $r = 0.45$ between the two statistical errors.
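
The purity bookkeeping that leads to the final priors can be retraced numerically. The sketch below uses only numbers quoted in the text and follows our reading of the procedure: the one-sided error bar of 0.08 is split into a central-value shift and a symmetric error of 0.04 each, and the purity systematics are added linearly to the earlier systematic errors. Variable names are ours.

```python
import math

purity = 0.9
# Extreme case: impure clusters carry no mass, so masses are boosted by 1/p
max_shift = math.log(1.0 / purity)

one_sided = 0.08                 # adopted one-sided error bar on the amplitude
shift = one_sided / 2.0          # symmetrized: shift of the central value
b_final = 0.91 + shift           # central value of the amplitude prior
b_sys = 0.06 + shift             # systematic errors added linearly

# Slope change if the amplitude error applied only at one end of a decade in N
slope_offset = (1.06 * math.log(10) + one_sided) / math.log(10) - 1.06
alpha_sys = 0.05 + round(slope_offset, 2)

print(f"max shift = {max_shift:.2f}, B = {b_final:.2f} +/- {b_sys:.2f} (sys)")
print(f"slope offset = {slope_offset:.2f}, alpha sys = {alpha_sys:.2f}")
```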

$L_X$-$N$ Priors

The priors in the $L_X$-$N$ relation come from repeating the analysis described in Rykoff et al. (2008b), but with $L_X$ defined as the X-ray luminosity in the 0.5-2.0 keV band, and corrected for aperture effects. As in Rykoff et al. (2008b), we restrict this analysis to clusters with $N > 30$. We begin by measuring the stacked mean $L_X$-$N$ relation and scatter on a fixed $1\,h^{-1}$ Mpc scale

$B_{L|N} = 1.69 \pm 0.04\,(stat)$ (A22)
$\alpha_{L|N} = 1.63 \pm 0.06\,(stat)$ (A23)
$\sigma_{L|N} = 0.84 \pm 0.03\,(stat)$ (A24)

where we have measured $L_X$ in units of $10^{43}$ ergs s$^{-1}$, with a pivot point of $N = 40$. We emphasize that the scatter determined above is the total scatter in the observed $L_X$-$N$ relation that cannot be attributed to Poisson uncertainties in the ROSAT photon counts. In particular, the quoted scatter is affected by possible point source contamination, AGN activity, cool cores, cluster mergers, etc.

There are multiple systematic errors that can affect the derived parameters for the $L_X$-$N$ relation. These include photometric redshift errors, evolution of the richness parameter $N$, uncorrected point sources, cluster mis-centering, and cluster AGN and cool cores. In addition, we need to account for the fraction of cluster flux lost due to our finite aperture and the RASS PSF, in order to compare our results with the luminosity measurements of Vikhlinin et al. (2008). We shall now discuss each of these possible systematic effects.

Rykoff et al. (2008b) find that the accuracy of the maxBCG photo-z estimates is high enough that any biases are insignificant
relative to the statistical uncertainty of the parameter determinations, and can thus be safely ignored. However, Rykoff
et al. (2008b) did find significant redshift evolution in the $L_X$-N relation, well above the expected self-similar evolution. Similar
redshift evolution is found in Becker et al. (2007); the reason for the systematic undercounting of cluster members at high redshift
is explained in Rozo et al. (2008). We have estimated the effect of this redshift evolution on our derived scatter parameter via a
simple Monte Carlo, and confirm that although the apparent evolution is strong, it is insignificant relative to the intrinsic scatter.
Therefore, we may also safely ignore this possible systematic effect.

We now take a combined approach to the systematic effects due to cluster mis-centering, a finite aperture, the RASS PSF and
uncorrelated point sources. The first three effects are strongly related, in that they all tend to scatter cluster photons out of our
initial fixed $1\,h^{-1}$ Mpc aperture, and these may affect the normalization, slope, and scatter in the $L_X$-N relation. Uncorrelated
point sources should not affect the mean relation because the large number of stacked sources smooths out the foreground and
background. However, when uncorrelated point sources are aligned with individual clusters they may increase the measured
scatter by boosting the apparent $L_X$.

We have estimated the effects of these systematics by running a Monte Carlo with simulated RASS data on top of random 
backgrounds selected from the area of the RASS photon map that overlaps with the maxBCG mask. We first resample the 
maxBCG richness function 100 times. Each cluster is given a redshift drawn from the maxBCG redshift distribution, as well as 
a random position on the sky selected from the area of the RASS survey that overlaps with the maxBCG mask. After we select
the richest 1000 clusters in each realization, each cluster is given a luminosity based on the mean relation from Eqn. A22 and an
input intrinsic scatter, $\sigma_{\rm in} = \{0.0, 0.2, 0.4, 0.6, 0.8, 1.0\}$. Each cluster luminosity is then converted to a number of photon counts
according to the RASS exposure at the given point, and scattered by Poisson uncertainties. Then, each cluster is given a position
offset according to the maxBCG miscentering distribution described in Johnston et al. (2007, see § 4.3). The cluster profiles are
assumed to follow a $\beta$ model, $S(R) = S_0 (1 + R^2/R_c^2)^{-3\beta + 1/2}$. To ensure we are on similar footing as Vikhlinin et al. (2008), we
randomly assign each cluster $\beta$ model parameters uniformly in the range $0.6 < \beta < 0.7$ and $0.05 < R_c < 0.15\,h^{-1}$ Mpc. Finally, the
photons are scattered according to the RASS PSF, following the method of Rykoff et al. (2008b, see § 3.3.1). We then calculate 
the stacked mean relation and scatter as described in Rykoff et al. (2008b). 
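
To get a feel for how much flux a finite aperture misses for the profiles used in these simulations, one can integrate the $\beta$ model analytically. The sketch below is an illustration only, not the pipeline of Rykoff et al. (2008b): it ignores miscentering and the RASS PSF, and the function name `enclosed_fraction` is hypothetical. For $\beta > 1/2$, the fraction of the total projected flux inside radius $R_{\rm ap}$ is $1 - (1 + R_{\rm ap}^2/R_c^2)^{3/2 - 3\beta}$.

```python
def enclosed_fraction(r_ap, r_core, beta):
    """Fraction of the total beta-model flux projected within radius r_ap.

    Follows from integrating S(R) = S0 * (1 + R^2/Rc^2)^(-3*beta + 1/2)
    over a disc; the total flux is finite only for beta > 1/2.
    """
    if beta <= 0.5:
        raise ValueError("total flux diverges for beta <= 1/2")
    return 1.0 - (1.0 + (r_ap / r_core) ** 2) ** (1.5 - 3.0 * beta)

# Mid-range parameters from the text: beta = 0.65, R_c = 0.10 h^-1 Mpc,
# evaluated at the fixed 1 h^-1 Mpc aperture.
frac = enclosed_fraction(1.0, 0.10, 0.65)
```

With these mid-range values roughly 87% of the flux falls inside the aperture, so the aperture alone loses flux at the ten percent level before miscentering and the PSF are considered.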

Figure A1 summarizes the results from our systematic tests. The x-axis shows the input intrinsic scatter, $\sigma_{\rm in}$. The y-axis shows
the ratio of the input parameter to output parameter for the normalization $B_{L|N}$ (circles), slope $\alpha_{L|N}$ (diamonds), and scatter $\sigma_{L|N}$
(squares). We note that when $\sigma_{\rm in} = 0.0$ then $\sigma_{\rm out} = 0.31 \pm 0.04$, which cannot be displayed on the plot. This is consistent with
our expectation that uncorrelated sources may boost the observed scatter, especially with low intrinsic scatter. Overall, we find
that (a) the slope $\alpha_{L|N}$ is not significantly biased; (b) at moderate to large scatter ($\sigma_{\rm in} > 0.5$) the intrinsic scatter $\sigma_{L|N}$ is not
significantly biased; and (c) the output normalization $B_{L|N}$ must be boosted by a factor of $1.20 \pm 0.05$ to account for the flux lost
to miscentering, the finite aperture, and RASS PSF effects. Our priors then become

$B_{L|N} = 1.87 \pm 0.04\ \mathrm{(stat)} \pm 0.05\ \mathrm{(sys)}$ (A25)
$\alpha_{L|N} = 1.63 \pm 0.06\ \mathrm{(stat)}$ (A26)
$\sigma_{L|N} = 0.84 \pm 0.03\ \mathrm{(stat)}$. (A27)
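
The shift from (A22) to (A25) can be checked directly. Assuming, as the quoted numbers suggest, that $B_{L|N}$ is the natural logarithm of the luminosity normalization, a multiplicative boost of 1.20 becomes an additive offset:

$$B_{L|N}^{\rm corrected} = B_{L|N} + \ln(1.20) \approx 1.69 + 0.18 = 1.87 .$$

Under the same assumption, the $\pm 0.05$ uncertainty on the boost factor corresponds to roughly $0.05/1.20 \approx 0.04$ in the logarithm, comparable to the systematic error quoted in (A25).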

FIG. A1. Results from systematic error Monte Carlo tests. The x-axis shows the input intrinsic scatter, $\sigma_{\rm in}$. The y-axis shows the ratio of the given
input parameter to output parameter for the normalization $B_{L|N}$ (circles), slope $\alpha_{L|N}$ (diamonds), and scatter $\sigma_{L|N}$ (squares). We note that when $\sigma_{\rm in} = 0.0$ then
$\sigma_{\rm out} = 0.31 \pm 0.04$, which cannot be displayed on the plot.

In addition to these corrections, we also need to take into account systematic uncertainties due to purity and completeness in
the sample. Just as with the weak lensing mass estimates, completeness should not affect the measured $L_X$-N relation, whereas
purity will tend to suppress the X-ray luminosity at fixed richness. Following the same procedure as in appendix A.2, we derive
systematic errors $\Delta B_{L|N} = 0.04$ and $\Delta\alpha_{L|N} = 0.05$, which we add linearly to our previous systematic error estimates. Finally, we
have repeated our scatter analysis using not just the 1000 richest clusters, but also the 2000 richest clusters, in which case we find
$\sigma_{L|N} = 0.95$. To take into account this variation in our analysis, we also introduce a systematic error $\Delta\sigma_{L|N} = 0.10$. Our final set
of priors is

$B_{L|N} = 1.91 \pm 0.04\ \mathrm{(stat)} \pm 0.09\ \mathrm{(sys)}$ (A28)
$\alpha_{L|N} = 1.63 \pm 0.06\ \mathrm{(stat)} \pm 0.05\ \mathrm{(sys)}$ (A29)
$\sigma_{L|N} = 0.84 \pm 0.03\ \mathrm{(stat)} \pm 0.10\ \mathrm{(sys)}$. (A30)



$L_X$-M Priors

As discussed in section 3, our analysis hinges on the fact that we can use prior knowledge about the $L_X$-M relation to constrain
the M-N relation. Here, we use the results of Vikhlinin et al. (2008) to put priors on the $L_X$-M relation, which may be
summarized as$^{17}$

$A_{L|M} + 1.361\,\alpha_{L|M} + 1.5\,(\sigma_{L|M}^2 - 0.40^2) = 2.59 \pm 0.08$ (A31)

$\alpha_{L|M} = 1.61 \pm 0.14$ (A32)
$\sigma_{L|M} = 0.40 \pm 0.04$. (A33)
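
The coefficient 1.361 multiplying the slope in (A31) is $\ln 3.9$. As a sketch, assume the mean relation is a power law in log space normalized at $10^{14}\,M_\odot$ (this functional form is an assumption made here for illustration). Evaluating it at the statistical pivot mass of $3.9 \times 10^{14}\,M_\odot$ quoted in this section gives

$$\langle \ln L_X \,|\, M = 3.9 \times 10^{14}\,M_\odot \rangle = A_{L|M} + \alpha_{L|M} \ln(3.9) \approx A_{L|M} + 1.361\,\alpha_{L|M} ,$$

so the first two terms of (A31) together are the log-normalization at the pivot mass rather than at $10^{14}\,M_\odot$.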

We report a prior on $A_{L|M} + 1.361\,\alpha_{L|M} + 1.5\,(\sigma_{L|M}^2 - 0.40^2)$ because at $M = 10^{14}\,M_\odot$ the $L_X$-M parameters derived from the Burenin
et al. (2007) sample are correlated. To decouple them, one needs to shift to the statistical pivot point $M = 3.9 \times 10^{14}\,M_\odot$ and
introduce the scatter dependence quoted above (Vikhlinin, private communication). These constraints are derived from Chandra
observations of clusters in the 400d cluster catalog (Burenin et al. 2007), which allowed Vikhlinin et al. (2008) to measure $Y_X$ and
thereby infer cluster mass using the M-$Y_X$ relation. This relation was itself calibrated on a cluster subsample for which masses
were derived using the standard hydrostatic equilibrium argument. This last point is very important, since simulations suggest
that hydrostatic mass estimates of clusters may be biased low by $\approx 10\%$-$30\%$ (see e.g. Evrard 1990; Rasia et al. 2006; Nagai
et al. 2007). One way to calibrate such uncertainties is to compare weak lensing mass estimates to hydrostatic mass estimates.
There are several examples of this type of approach. For instance, Vikhlinin et al. (2008) have performed such an analysis using
the weak lensing mass estimates of Hoekstra (2007), and find $M_{wl} = (1.09 \pm 0.11)\,M_X$. A similar analysis has been carried out by
Mahdavi et al. (2008), who used the weak lensing mass estimates of Hoekstra (2007) and their own analysis of Chandra public
data to obtain $M_{wl} = (1.28 \pm 0.15)\,M_X$. Finally, using XMM X-ray observations and the weak lensing data of Bardeau et al. (2005),
Bardeau et al. (2007a), and Dahle (2006), Zhang et al. (2008) find $M_{wl} = (1.21 \pm 0.13)\,M_X$. Zhang et al. (2008) also note, however,
that a histogram of $M_{wl}/M_X$ peaks at a ratio of $1.00 \pm 0.05$, and that clusters in the tails of the distribution tend to have tight error
bars, possibly biasing the error weighted ratio. In light of this, we have opted for a "middle of the road" approach, and introduce
a correction factor $1.15 \pm 0.15$. Our corresponding prior is

$A_{L|M} + 1.361\,\alpha_{L|M} + 1.5\,(\sigma_{L|M}^2 - 0.40^2) = 2.45 \pm 0.08\ \mathrm{(stat)} \pm 0.23\ \mathrm{(sys)}$ (A34)
$\alpha_{L|M} = 1.61 \pm 0.14\ \mathrm{(stat)}$ (A35)
$\sigma_{L|M} = 0.40 \pm 0.04\ \mathrm{(stat)}$. (A36)

17 We have included the evolution correction for a median redshift z = 0.23, as appropriate for the maxBCG sample.

TABLE B1. maxBCG Mass Function Data
Mean and covariance matrix of the maxBCG mass function. Masses are defined using an overdensity of 500 relative to critical, and are measured in units of
$10^{14}\,M_\odot$. Space densities are measured in units of Mpc$^{-3}$. Diagonal terms in the covariance matrix above are set to $\sqrt{C_{ii}}/\langle n_i \rangle$, and thus represent the fractional
uncertainty in the halo space density. Off diagonal terms contain the correlation coefficient $r_{ij} = C_{ij}/\sqrt{C_{ii}C_{jj}}$ between the various bins. The median redshift of
the sample is z = 0.23.

Estimating systematic errors in $\alpha_{L|M}$ and $\sigma_{L|M}$ is difficult. For instance, comparisons with weak lensing masses are not an
effective way of assessing systematics because weak lensing mass estimates are so noisy: trying to fit a power law relation
between $M_{wl}$ and $M_X$ results in very large errors for the slope of the relation.

One alternative is to consider multiple studies of the $L_X$-M relation in order to assess how sensitive the recovered parameters
are to the analysis pipeline. Unfortunately, such an exercise is far from trivial. One difficulty is the fact that there is very little
agreement on the meaning of $L_X$, with many works focusing on core-excised and/or core-corrected bolometric X-ray luminosities
(e.g. Bardeau et al. 2007b; Zhang et al. 2007, 2008). Even among those works that also explore the $L_X$-M relation when $L_X$ is a
soft X-ray band luminosity (e.g. Reiprich & Böhringer 2002; Maughan 2007), there are still important differences in the aperture
used to estimate $L_X$. In principle, we could attempt to convert between the various definitions of $L_X$ to try to compare the works
against each other, but many of these $L_X$-M measurements are affected by Malmquist bias, making comparisons to the Vikhlinin
et al. (2008) results difficult.

One work that does constrain the soft X-ray band, non-core excised, Malmquist bias corrected $L_X$-M relation is Stanek
et al. (2006). Unfortunately, the energy band they use is slightly different from that of Vikhlinin et al. (2008), so even here
comparison is not trivial. We expect, however, that at least the scatter and slopes of the $L_X$-M relation will not be strongly affected
by the minor differences between the two $L_X$ definitions. Given our purposes, the interesting thing about the Stanek et al. (2006)
results is that they use a very different methodology for constraining the $L_X$-M relation. In particular, they assume knowledge
of cosmological parameters, and then use the observed cluster X-ray luminosity function to constrain $P(L_X|M)$. Assuming their
"compromise cosmology", which they argue gives the best results, they find $\alpha_{L|M} = 1.60 \pm 0.05$ and $\sigma_{L|M} = 0.34 \pm 0.10$. These
values are in excellent agreement with those of Vikhlinin et al. (2008), and suggest that placing additional systematic errors on
the $L_X$-M parameters is not really necessary at this point.
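
One simple way to quantify this agreement is to express each difference in units of the combined uncertainty. The sketch below assumes independent Gaussian errors added in quadrature, which is an assumption made here and not a statement about how either analysis treats its errors. The helper name `tension_sigma` is hypothetical.

```python
import math

def tension_sigma(x1, err1, x2, err2):
    """Difference between two measurements in units of their combined error.

    Assumes independent Gaussian errors added in quadrature.
    """
    return abs(x1 - x2) / math.hypot(err1, err2)

# Slope: Stanek et al. (2006) vs. Vikhlinin et al. (2008), values from the text.
slope_tension = tension_sigma(1.60, 0.05, 1.61, 0.14)
# Scatter: same two works.
scatter_tension = tension_sigma(0.34, 0.10, 0.40, 0.04)
```

Under this assumption both differences are well below one combined standard deviation.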

MASS FUNCTION DATA 

Table B1 presents the mean and covariance matrix of the mass function data derived from our analysis. These results represent
the state of the art mass function measurements at low redshift from optically derived cluster catalogs. We emphasize we assumed
$\Omega_m = 0.27$ and h = 0.71, so appropriate rescaling must be applied if the results are to be compared against significantly different
cosmologies. Note that the covariance matrix data in table B1 is normalized such that the diagonal entries are the fractional error
$\sqrt{C_{ii}}/\langle n_i \rangle$, while the off diagonal entries are the correlation coefficients $r_{ij} = C_{ij}/\sqrt{C_{ii}C_{jj}}$. We present the data in this way
because fractional errors and correlation coefficients are easier to interpret than raw covariances. The actual values of the covariance
matrix are easily reconstructed from the data in the table.
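
The reconstruction follows directly from the normalization described above: $C_{ii} = (f_i \langle n_i \rangle)^2$, where $f_i$ is the tabulated fractional error, and $C_{ij} = r_{ij} \sqrt{C_{ii} C_{jj}}$. A minimal sketch is given below. The function name `reconstruct_covariance` is hypothetical and the input numbers are placeholders for illustration only, not the Table B1 values.

```python
def reconstruct_covariance(mean_density, table):
    """Rebuild C_ij from a table with fractional errors on the diagonal
    and correlation coefficients off the diagonal."""
    n = len(mean_density)
    # Absolute 1-sigma errors: sqrt(C_ii) = fractional error * <n_i>.
    sigma = [table[i][i] * mean_density[i] for i in range(n)]
    cov = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                cov[i][j] = sigma[i] ** 2
            else:
                cov[i][j] = table[i][j] * sigma[i] * sigma[j]
    return cov

# Placeholder numbers for illustration only; these are NOT the Table B1 values.
example_density = [2.0e-6, 5.0e-7]
example_table = [[0.10, 0.30],
                 [0.30, 0.20]]
example_cov = reconstruct_covariance(example_density, example_table)
```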



